Method for processing data streams divided into a plurality of process steps

ABSTRACT

The present invention relates to a processing unit (100) and a method for processing a plurality of data streams by an algorithm divided into a plurality of Process Steps (PS), comprising: an interconnection unit (102) comprising means for switching; Process Step (PS) means (106) comprising at least two PS modules (106a-106m), each connected to the interconnection unit (102); and a scheduler (110) connected to said interconnection unit (102) and to each PS module (106a-106m), wherein said processing unit (100) comprises: a memory unit (108) comprising at least two memories (108a-108n), wherein each memory is connected to the interconnection unit (102); the interconnection unit (102) further comprising means for providing at least a first connection between one of said memories and one of said PS modules and a second connection between another of said memories and another of said PS modules, wherein the interconnection unit (102) is adapted to connect each memory to each of the PS modules by a switching activity, and wherein the switching activity and the processing of the PS modules are controlled by the scheduler (110); and each memory comprises means for storing a data stream, and said data streams are manipulated in parallel by the respectively connected PS modules during a predetermined time period between said switching activities.

FIELD OF THE INVENTION

The present invention relates to a processing unit.

In particular, it relates to a processing unit and a method for resource-efficient processing of multiple data streams by complex algorithms.

BACKGROUND OF THE INVENTION

Implementation of a function comprising a complex algorithm, such as in speech coding/decoding for a speech channel, requires a high number of arithmetic operations such as multiplication, summation and subtraction, especially when several speech channels have to be processed simultaneously. The data is normally processed in different steps, e.g. pre-scaling unit, low pass filter, high pass filter, voice activity detector, code book search, gain quantifier, post processors, etc. In a speech coder, several channels have to be processed, i.e. encoded/decoded, during a limited time period. E.g., if K channels have to be processed within L s, a new channel has to enter the processing unit every L/K s. The functions processing each channel require a number of operations as mentioned above, and the functions may require different numbers of clock cycles to perform their operations. A problem is how to easily divide and group the functions so that the required operations can be performed, preferably in parallel, within a limited predetermined time period, particularly when a reference model exists in a software language (C, Pascal, etc.). All the processing is normally independent manipulation of the data stream.

Normally, implementations are performed by digital signal processing units running the software algorithm, or by a microprocessor feeding an arithmetic unit with parallel data. Only simple algorithms are usually implemented directly in hardware without a microprocessor.

U.S. Pat. No. 6,314,393 discloses a known method for performing processing in parallel. A parallel/pipeline VLSI architecture for a coder/decoder is described.

U.S. Pat. No. 6,201,488 shows a coder/decoder adapted to perform different algorithms. An algorithm is divided into smaller portions, called programs, where each program requires a program memory and a processor. One program operates on a data unit located at a predetermined memory position, and it is not possible to perform parallel operations. In addition, it is not possible to perform both a read and a write operation during one clock cycle. The programs may require different times for their calculations, and in order to perform calculations in cycles a waiting time (“idling operation”) is introduced. The waiting time is used for swapping the data units.

The drawback of the solutions described above is that it is not possible to process a large number of data sets by a time-consuming and complex algorithm within a sufficiently short time period.

Thus, an object of the present invention is to create a processing unit and a method adapted to process a plurality of data streams, e.g. speech channels, by an algorithm within a limited predetermined time period.

SUMMARY OF THE INVENTION

The above-mentioned objects are achieved by the present invention according to the independent claims, by a method having the features of claims 1 and 9.

Preferred embodiments are set forth in the dependent claims.

An advantage of the present invention is that it provides a resource-effective way of performing an algorithm in parallel without requiring duplication of similar units. The present invention is thus particularly suitable for a plurality of data streams that require similar, but not necessarily identical, processing.

Another advantage of the present invention is that it is independent of the order in which the data streams are accessed. The process steps are able to read from or write to the memories within the memory unit in arbitrary order, independently of other process steps, as long as the end product is correct at the end of each process step, when the switching activity occurs.

Another advantage of the present invention is that it provides an advantageous way to place circuits on the unit. Dividing an algorithm into process steps facilitates the placement of the different units arranged for the hardware implementation, and the signal routing, both of which are important for Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). The present invention facilitates separation of an algorithm into separate circuits, where each circuit corresponds to one process step. This is suitable for FPGAs that do not have as high a gate capacity as an ASIC.

Another advantage of the present invention is that no microprocessor is used, which implies that no program memory is required. Thus all processing is performed by means of customized hardware.

Another advantage of the present invention is that the number of data movements within the hardware is reduced, and if the entire processing unit is implemented within a single circuit, it is possible to use a memory with one or several read and write ports, allowing multiple read and write accesses during a single clock cycle.

Yet another advantage of the present invention is that several channels are processed simultaneously and periodically by the function.

A further advantage of the present invention is that it is suitable for creating periodic data, e.g. processing of multiple data streams in different applications.

A further advantage is that the present invention facilitates debugging when a complex algorithm is divided into smaller process steps according to the invention. This division also provides a gain during the development of the processing unit.

A further advantage of the present invention is that it comprises distributed, separate memories. By using separate memories, it is possible to adapt the location of the memories depending on e.g. power distribution facilities.

BRIEF DESCRIPTION OF THE APPENDED DRAWINGS

FIG. 1 illustrates a processing unit according to the present invention.

FIGS. 2a-2f illustrate a method according to the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Preferred embodiments of the present invention will now be described with reference to FIGS. 1 and 2. FIG. 1 shows a processing unit 100 in accordance with the present invention. The processing unit 100 comprises an interconnection unit 102 adapted to switch memory access signals. The interconnection unit 102 is preferably a space switch or a space rotator. The interconnection unit 102 is connected to Process Step means 106 comprising at least two Process Step (PS) modules 106a-106m, and to at least two memories M1-Mn 108a-108n in a memory unit 108, where n denotes the number of memories in the memory unit 108 and m denotes the number of PS modules 106a-106m. At least one external memory 104 is connected to at least one PS, provided that the PS controls the data movements. It should be noted that if the process steps do not control the data movements, the external memory is instead connected to the interconnection unit, and the number of memories is then required to exceed the number of PSs by one or two. The external memory 104 is adapted to store e.g. input and output data of the processing unit 100. A scheduler 110 is connected to the interconnection unit 102 and to each of the PS modules 106a-106m. The scheduler 110 controls the interconnection unit 102 and the PS modules, scheduling their clock cycles. A PS module 106a-106m may be implemented by means of an FPGA or an ASIC. Alternatively, the scheduler 110 may be arranged within the interconnection unit 102. The data manipulation steps belonging to a specific PS are performed in the corresponding PS module 106a-106m, as further described below. Different arithmetic operations are performed in each PS module 106a-106m, and the PS modules operate in parallel. Thus, the processing unit does not require a processor such as a Digital Signal Processor (DSP).

Process Step (PS)

According to the present invention, the different functions in which data is manipulated are extracted, and the maximum and average numbers of arithmetic operations required by each function are calculated, where a function is a number of data manipulation steps. At least one function is arranged into a group of functions, called a Process Step (PS) P1-Pm. When a loop is repeated an undetermined number of times, all functions used within that loop of manipulation steps have to belong to one single PS. Additionally, data may not be fed back within a PS. However, when a loop is repeated a predetermined number of times, manipulation steps located in different PSs may be used within the loop. Preferably, the operations within one PS have a substantially similar complexity.
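The grouping rules above can be illustrated with a minimal Python sketch. It assumes (this is not stated in the text) that the functions form an ordered chain, that the worst-case operation count of each function is known, and that the PS budget is expressed as a maximum number of operations per PS. A greedy pass closes a PS when the next block would exceed the budget, but never splits functions that share a loop with an undetermined iteration count; functions without such a loop may be split freely. The names `Function`, `loop_id` and `group_into_process_steps`, and all cost figures, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Function:
    name: str
    max_ops: int                   # worst-case arithmetic operations (hypothetical figures)
    loop_id: Optional[str] = None  # set when inside a loop with an undetermined iteration count


def _blocks(functions: List[Function]) -> List[List[Function]]:
    """Merge consecutive functions of the same undetermined loop into one indivisible block."""
    blocks: List[List[Function]] = []
    for f in functions:
        if blocks and f.loop_id is not None and blocks[-1][-1].loop_id == f.loop_id:
            blocks[-1].append(f)
        else:
            blocks.append([f])
    return blocks


def group_into_process_steps(functions: List[Function], budget: int) -> List[List[str]]:
    """Greedily pack the ordered functions into PSs whose worst-case cost stays within budget."""
    steps: List[List[Function]] = []
    current: List[Function] = []
    cost = 0
    for block in _blocks(functions):
        block_cost = sum(f.max_ops for f in block)
        if block_cost > budget:
            raise ValueError(f"block starting at {block[0].name} exceeds the PS budget")
        if current and cost + block_cost > budget:
            steps.append(current)          # close the current PS
            current, cost = [], 0
        current.extend(block)
        cost += block_cost
    if current:
        steps.append(current)
    return [[f.name for f in step] for step in steps]


# Hypothetical chain; the two code book search parts share an undetermined loop "cb".
chain = [
    Function("prescale", 200),
    Function("highpass", 600),
    Function("vad", 500),
    Function("cb_search_a", 700, loop_id="cb"),
    Function("cb_search_b", 500, loop_id="cb"),
    Function("gain_quant", 300),
]
print(group_into_process_steps(chain, budget=1500))
# [['prescale', 'highpass', 'vad'], ['cb_search_a', 'cb_search_b', 'gain_quant']]
```

A greedy contiguous partition is only one possible strategy; balancing the PSs against each other (as the text prefers) could instead be done by searching over all split points.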

Processing Unit

Each memory in the memory unit preferably has the same size, which is determined by the PS that requires the most memory. The memory unit 108 comprises at least one in-out memory and at least one processing memory on which the PS operates. Preferably, one additional memory is used as an external memory 104. The number of external memories depends on the amount of data that is to be transferred to the memory and on the number of ports of the memories, i.e. there may be one input/output external memory, or one input memory and one output memory. The external memory 104 is used for storing data between processing activities. All memories M1-Mn are connected to the interconnection unit 102, which is always active and connects each PS P1-Pm to all memory signals of a respective memory M1-Mn in such a way that each PS P1-Pm is connected to a single memory M1-Mn in the memory unit 108. The interconnection unit 102 is adapted to switch each PS from a respective first memory 108a to a respective second memory 108b within one clock cycle, at a time point indicated by the scheduler 110. The scheduler 110 controls the interconnection unit 102 and the PS modules 106a-106m. Furthermore, the scheduler 110 informs the PS modules when they are allowed to start accessing the memories and to start their processing.

The scheduler 110 schedules the actions of the interconnection unit by giving activation orders. During the time between the activation orders from the scheduler, a PS performs its portion of the algorithm, which includes read and write accesses to the memory within the memory unit that it is currently connected to. The number of concurrent read and write accesses during one single clock cycle depends on the number of access ports of the memory. I.e. if the memory has one read port and one write port, a read and a write access may be performed during one single clock cycle, while a memory with a common read and write port would require two cycles for the same access sequence.
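The effect of the port configuration on an access sequence can be sketched with a simple cycle-count model in Python. The model is an assumption made for illustration: each port serves one access per clock cycle, dedicated read and write ports work concurrently, and a shared read/write port serves reads and writes one at a time. The function name `access_cycles` is hypothetical.

```python
import math


def access_cycles(reads: int, writes: int, read_ports: int = 0,
                  write_ports: int = 0, rw_ports: int = 0) -> int:
    """Minimum clock cycles for a PS to issue `reads` reads and `writes` writes.

    Simplified model (assumption): either dedicated read/write ports or
    shared read/write ports, one access per port per cycle.
    """
    if rw_ports and (read_ports or write_ports):
        raise ValueError("model supports either dedicated or shared ports, not both")
    if rw_ports:
        # Shared ports: reads and writes compete for the same ports.
        return math.ceil((reads + writes) / rw_ports)
    if (reads and not read_ports) or (writes and not write_ports):
        raise ValueError("no port available for the requested access type")
    # Dedicated ports: the read and write streams proceed in parallel.
    r = math.ceil(reads / read_ports) if reads else 0
    w = math.ceil(writes / write_ports) if writes else 0
    return max(r, w)


print(access_cycles(1, 1, read_ports=1, write_ports=1))  # 1: one read port and one write port
print(access_cycles(1, 1, rw_ports=1))                   # 2: common read and write port
print(access_cycles(4, 2, read_ports=2, write_ports=1))  # 2: two read ports, one write port
```

Real memories add pipeline latency and read-during-write rules that this count ignores; it only captures the port-limited throughput discussed in the text.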

When the process step performs its calculation and data transfer operations, it may perform the accesses in any order and to any memory position during its processing period, as long as the process step produces the same end product (provided that the same memory content is used) at the end of the period. This presupposes that the memory comprises at least two ports: one read port and one write port. However, other types of memories also exist, comprising e.g. a single read/write port, or one write port and two read ports. Naturally, it is possible to select these other types of memories, but the selected memory type may influence the possible read/write capacity during one clock cycle.

Processing

If K data streams/channels are to be processed within L seconds, a new data stream/channel enters the processing unit 100 every L/K seconds. I.e. the processing time of each PS 106a-106m is limited to L/K seconds, and each data stream is completely processed within L*m/K seconds, where m is the number of PSs.
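These timing relations can be checked numerically. In the Python sketch below, the clock frequency `f_clk_hz` is an assumed parameter that the text does not specify; it converts the per-PS time slot L/K into a cycle budget per PS (called M in the example of FIGS. 2a-2f). The figures in the usage line (32 channels, 20 ms, four PSs, 100 MHz) are hypothetical, and `slot_budget` is an illustrative name.

```python
from fractions import Fraction


def slot_budget(L_s: Fraction, K: int, m: int, f_clk_hz: int):
    """Return (slot length L/K in s, latency L*m/K in s, cycle budget per PS)."""
    slot = L_s / K                # a new channel enters every L/K seconds
    latency = slot * m            # each channel passes through all m PSs
    M = int(slot * f_clk_hz)      # whole clock cycles available to each PS per slot
    return slot, latency, M


# Hypothetical: 32 channels every 20 ms, 4 PSs, 100 MHz clock.
slot, latency, M = slot_budget(Fraction(20, 1000), K=32, m=4, f_clk_hz=100_000_000)
print(float(slot), float(latency), M)   # 0.000625 0.0025 62500
```

Exact fractions are used so that the slot length and latency are not affected by floating-point rounding before the final conversion to whole cycles.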

If the units which transfer the data (e.g. a channel) between the external memory 104 and the internal memories 108a-108n within the memory unit 108 are considered as one or more PSs, the number of PSs is equal to the number of internal memories 108a-108n. I.e. the first PS transfers data from the external memory 104 to an internal memory 108a-108n within the memory unit 108, and the last PS transfers data from an internal memory 108a-108n within the memory unit 108 to the external memory 104. If the memories 108a-108n comprise more than one port, or if there are enough cycles to perform input and output transfers in one sequence, it is possible to merge the first and the last PS into one combined input and output PS.

In the example below, illustrated in FIGS. 2a-2f, it is assumed that the number of data streams/channels is K, Ch1-ChK, and that n=4 and m=4. There are thus four memories, M1, M2, M3 and M4, and four PSs, P1-P4, wherein the first PS, P1, collects data from the external memory to an internal memory and the last PS, P4, collects data from an internal memory to the external memory. All channels have to be processed within L seconds, which implies that a new channel enters the processing unit every L/K seconds and, preferably, another channel leaves the processing unit every L/K seconds. Hence, each PS has a maximum allowed time of L/K seconds, corresponding to M clock cycles. However, the PS modules do not have to utilise the entire maximum allowed time, i.e. each PS module is allowed to use at most M clock cycles.

FIGS. 2a-2f show a processing unit comprising an interconnection unit 202 connected to a memory unit 208 comprising four memories M1-M4, an external memory 204, process step means 206 comprising PS modules P1-P4, and a scheduler 210 that is further connected to said process step means. FIGS. 2a-2f illustrate the procedure when a number of data streams, e.g. a number of speech channels, are processed by the processing unit.

FIG. 2a: M1 is connected to P1, and P1 performs its operation, i.e. collects data (Ch1) from the external memory to M1, during a number of clock cycles p (where p ≤ M).

FIG. 2b: After M clock cycles, the scheduler 210 orders the interconnection unit 202 to perform a switching activity, which results in M1 now being connected to P2 and M2 being connected to P1. P1 performs its operations on M2 during p clock cycles, i.e. collects data (Ch2) from the external memory to M2, and simultaneously P2 performs its operations on M1 during q clock cycles (q < M).

FIG. 2c: After another M clock cycles, the interconnection unit 202 performs a switching activity, which results in M1 now being connected to P3, M2 to P2 and M3 to P1. P3 performs its operations on M1 during r clock cycles (r ≤ M), and simultaneously P2 performs its operations on M2 during q clock cycles and P1 performs its operation, i.e. collects data (Ch3) from the external memory to M3, during p clock cycles.

FIG. 2d: After yet another M clock cycles, the interconnection unit 202 performs a switching activity, which results in M1 now being connected to P4, M2 to P3, M3 to P2 and M4 to P1. P4 performs its operations on M1 during s clock cycles, i.e. collects data from M1 to the external memory (the processing of Ch1 is now completed), and simultaneously P3 performs its operations on M2 during r clock cycles, P2 performs its operation on M3, and P1 performs its operation on M4, i.e. collects data (Ch4) from the external memory to M4.

FIG. 2e: After yet another M clock cycles, the interconnection unit 202 performs a switching activity, which results in M1 now being connected to P1, M2 to P4, M3 to P3 and M4 to P2. P1 performs its operations on M1, i.e. collects data (Ch5) from the external memory to M1, and simultaneously P2 performs its operations on M4, P3 performs its operation on M3, and P4 performs its operation on M2, i.e. collects data from M2 to the external memory (the processing of Ch2 is now completed).

FIG. 2f: After yet another M clock cycles, the interconnection unit 202 performs a switching activity, which results in M1 now being connected to P2, M2 to P1, M3 to P4 and M4 to P3. P2 performs its operations on M1, and simultaneously P3 performs its operations on M4, P4 performs its operation on M3, i.e. collects data from M3 to the external memory (the processing of Ch3 is now completed), and P1 performs its operation on M2, i.e. collects data (Ch6) from the external memory to M2.
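The connection pattern of FIGS. 2a-2f follows a simple rotation. Counting time slots t and indices from zero, with n memories and n PSs (n = m = 4 here), memory i is connected to PS (t - i) mod n, and it holds the channel that P1 loaded into it in slot t - ((t - i) mod n). The Python sketch below reproduces the six figures under that reading; the function `connections` is hypothetical, and slots before a memory first receives a channel (the start-up phase) are reported as `None`.

```python
from typing import Dict, Optional, Tuple


def connections(t: int, n: int = 4) -> Dict[str, Optional[Tuple[str, int]]]:
    """Map each memory to (PS, channel number) during time slot t (0-based).

    Assumes the rotation of FIGS. 2a-2f: P1 loads channel t+1 into memory
    t mod n, and every switching activity moves each memory on to the next PS.
    """
    result: Dict[str, Optional[Tuple[str, int]]] = {}
    for i in range(n):
        if t < i:
            result[f"M{i + 1}"] = None          # start-up: memory not yet filled
            continue
        ps = (t - i) % n                        # 0 -> P1 (input), n-1 -> Pn (output)
        loaded_at = t - ps                      # slot in which P1 filled this memory
        result[f"M{i + 1}"] = (f"P{ps + 1}", loaded_at + 1)
    return result


for t in range(6):                              # one line per figure, 2a to 2f
    print(f"2{'abcdef'[t]}:", connections(t))
```

For slot 3 (FIG. 2d) this yields M1 with P4 and Ch1, M2 with P3 and Ch2, M3 with P2 and Ch3, and M4 with P1 and Ch4, matching the description; from then on, every slot connects each PS to exactly one memory.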

Hence, this procedure is repeated cyclically and continues until substantially all K data streams/channels have been processed by P1-P4. However, it is not required that all PSs are active during the entire session. E.g., if the data stream located in one memory is a channel containing speech, this channel is not processed by a PS that handles comfort noise. This particular PS is nevertheless connected to the memory containing the data stream, although no processing is performed. It should also be noted that the numbers of clock cycles denoted p, q, etc. are not fixed; they depend on the type of data within the data stream/channel. However, each of these numbers is required to be less than or equal to M.
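Because p, q, r and s vary with the content of each channel, a designer has to make sure that the worst case of every PS fits within the slot budget M. A minimal Python check is sketched below. The channel types, the cycle figures in `CYCLES`, the value of `M`, and the zero entry for a PS that stays connected but idle on a speech channel (the comfort-noise case above) are all hypothetical; `check_budget` is an illustrative name.

```python
from typing import Dict

M = 62_500  # hypothetical per-slot cycle budget

# Hypothetical cycles used by each PS per channel type; 0 = connected but idle.
CYCLES: Dict[str, Dict[str, int]] = {
    "P1": {"speech": 4_000, "silence": 4_000},
    "P2": {"speech": 41_000, "silence": 9_000},
    "P3": {"speech": 0, "silence": 12_000},   # e.g. comfort-noise generation
    "P4": {"speech": 4_000, "silence": 4_000},
}


def check_budget(cycles: Dict[str, Dict[str, int]], budget: int) -> Dict[str, int]:
    """Return each PS's worst-case cycle count; raise if any exceeds the budget."""
    worst = {ps: max(per_type.values()) for ps, per_type in cycles.items()}
    over = {ps: c for ps, c in worst.items() if c > budget}
    if over:
        raise ValueError(f"PS exceeding the {budget}-cycle slot: {over}")
    return worst


print(check_budget(CYCLES, M))
```

Checking the worst case per PS, rather than per channel, reflects that the scheduler switches at fixed intervals of M cycles regardless of how early a PS finishes.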

Interconnection

A memory unit comprises one or several memories. Each memory comprises a control bus, one or several address buses and one or several read/write data buses. Each PS has a connection to exactly one of those memories. The connection is handled by the interconnection unit. At the beginning of a time period, each PS is switched to another memory by the interconnection unit. The interconnection unit switches all the memory signals, such as the read/write data, control and address buses, from one PS to the next PS. During that time period a memory is connected to only one process step.
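The interconnection unit can be modelled behaviourally as a crossbar that holds a one-to-one assignment of PSs to memories: a switching activity replaces the whole assignment at once, and every read or write a PS issues is routed to whichever memory it currently owns. The Python sketch below is only such a model, with hypothetical class and method names; in hardware the same routing would be performed by multiplexing the address, data and control buses.

```python
from typing import Dict, List


class Interconnect:
    """Behavioural model: each PS is connected to exactly one memory at a time."""

    def __init__(self, memories: Dict[str, List[int]]):
        self.memories = memories
        self.route: Dict[str, str] = {}          # PS name -> memory name

    def switch(self, route: Dict[str, str]) -> None:
        """Switching activity: install a new PS-to-memory assignment."""
        targets = list(route.values())
        if len(set(targets)) != len(targets):
            raise ValueError("a memory may be connected to only one PS")
        if not set(targets).issubset(self.memories):
            raise KeyError("unknown memory in route")
        self.route = dict(route)

    def read(self, ps: str, addr: int) -> int:
        return self.memories[self.route[ps]][addr]

    def write(self, ps: str, addr: int, value: int) -> None:
        self.memories[self.route[ps]][addr] = value


mems = {name: [0] * 8 for name in ("M1", "M2")}
xbar = Interconnect(mems)
xbar.switch({"P1": "M1"})
xbar.write("P1", 0, 42)                  # P1 fills M1
xbar.switch({"P1": "M2", "P2": "M1"})    # rotate: M1 moves on to P2
print(xbar.read("P2", 0))                # P2 now sees P1's result: 42
```

The data itself never moves when the connections switch; only the routing changes, which is what allows a channel to pass from one PS to the next without copying.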

Memory Structure

The memory area may be divided for storing four groups of data:

- constant data, used during the session;
- session data: data that is used and produced during the session and stored in the external memory between the times the channel is switched in and out of an internal memory;
- global process step data: data that is used in several PSs and passed from one PS to another PS; and
- local process step data: data that is used temporarily within one PS.
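One possible way to realise the four groups, sketched in Python, is to give every internal memory the same fixed layout with one address range per group, so that each PS finds its data at the same offsets whichever memory it is currently connected to. The region sizes are hypothetical and `build_layout` is an illustrative helper, not part of the described design.

```python
from typing import Dict, List, Tuple

# Hypothetical region sizes in words; the text only names the four groups.
REGIONS: List[Tuple[str, int]] = [
    ("constant", 256),      # used during the whole session
    ("session", 512),       # saved to / restored from the external memory
    ("global_ps", 1024),    # passed from one PS to the next
    ("local_ps", 768),      # scratch space for a single PS
]


def build_layout(regions: List[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Assign consecutive [start, end) address ranges to the regions."""
    layout: Dict[str, Tuple[int, int]] = {}
    addr = 0
    for name, size in regions:
        layout[name] = (addr, addr + size)
        addr += size
    return layout


layout = build_layout(REGIONS)
memory_size = max(end for _, end in layout.values())
print(layout)
print(memory_size)  # 2560 words per internal memory
```

With such a layout, only the session range needs to be transferred to and from the external memory when a channel is switched in or out.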

Furthermore, provided that the memories in the memory unit have one single port, each clock cycle may belong to one of two phases: in a first phase, data may be moved to and from the interconnection unit every second half cycle, and a second phase may be used for internal updates within the PS (P1-Pm).

The present invention is not limited to the above-described preferred embodiments. Various alternatives, modifications and equivalents may be used. Therefore, the above embodiments should not be taken as limiting the scope of the invention, which is defined by the appended claims.

1-12. (canceled)
13. A processing unit (PA) for processing a plurality of data streams by an algorithm divided into a plurality of process steps, said PA comprising: an interconnection unit comprising means for switching; Process Step (PS) means comprising at least two PS modules, where each PS module is connected to the interconnection unit and a scheduler connected to said interconnection unit and to each PS module; a memory unit comprising at least two memories wherein each memory is connected to the interconnection unit; the interconnection unit further comprising means for providing at least a first connection between one of said memories and one of said PS modules and a second connection between another of said memories and another of said PS modules, wherein the interconnection unit is adapted to connect each memory to each of the PS modules by a switching activity, wherein the switching activity and the processing of the PS modules are controlled by the scheduler; and each memory comprises means for storing a data stream and said stored data streams are manipulated in parallel by the connected PS modules respectively, during a predetermined time period between said switching activities.
14. The Processing Unit (PA) according to claim 13, further comprising at least one external memory for storing at least input and output data for the memories within the memory unit.
15. The Processing Unit (PA) according to claim 13, wherein said data streams are channels in a communication system.
16. The Processing Unit (PA) according to claim 15, wherein said channels are speech channels and said PA is implemented in a speech coder.
17. The Processing Unit (PA) according to claim 13, wherein said process step modules are implemented by means of hardware suitable for the algorithm.
18. The Processing Unit (PA) according to claim 13, wherein at least one of the PS modules transfers data between the external memory and any of the memories within the memory unit.
19. A method for processing a plurality of data streams by an algorithm divided into a plurality of Process Steps (PS) by using an interconnection unit comprising means for switching, Process Step (PS) means comprising at least two PS modules, each connected to the interconnection unit, and a scheduler connected to said interconnection unit and to each PS module, said method comprising the steps of: connecting at least two memories within a memory unit to the interconnection unit; providing by the interconnection unit a first connection between one of said memories and one of said PS modules and a second connection between another of said memories and another of said PS modules, wherein the interconnection unit is adapted to connect each memory to each of the PS modules by a switching activity, wherein the switching activity and the processing of the PS modules are controlled by the scheduler; storing a data stream in each memory; and manipulating said data streams in parallel by the connected PS modules respectively, during a predetermined time period between said switching activities.
20. The method according to claim 19, wherein the method comprises the further step of storing at least input and output data for the memories within the memory unit in the at least one external memory.
21. The method according to claim 19, wherein said data streams are channels in a communication system.
22. The method according to claim 21, wherein said channels are speech channels and said processing unit is implemented in a speech coder.
23. The method according to claim 19, wherein said process step modules are implemented by means of hardware suitable for the algorithm.
24. The method according to claim 19, wherein at least one of the PS modules transfers data between the external memory and any of the memories within the memory unit.